Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures
Authors
Abstract
Individual privacy is at risk if a published data set is not properly de-identified. k-anonymity is a major technique for de-identifying a data set. A more general view of k-anonymity is clustering with a constraint on the minimum number of objects in every cluster. Most existing approaches to achieving k-anonymity by clustering are for numerical (or ordinal) attributes. In this paper, we study achieving k-anonymity by clustering in attribute hierarchical structures. We define generalisation distances between tuples to characterise the distortion caused by generalisation and discuss the properties of these distances. We conclude that the generalisation distance is a metric. We propose an efficient clustering-based algorithm for k-anonymisation. We experimentally show that the proposed method is more scalable and causes significantly less distortion than an optimal global recoding k-anonymity method.
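To make the ideas in the abstract concrete, the following is a minimal Python sketch of generalising values in an attribute hierarchy and checking the "at least k objects per cluster" constraint. It is not the paper's algorithm, and the distance shown is an illustrative step count, not the generalisation distance the paper defines. The hierarchy PARENT and the functions ancestors, closest_common_generalisation, illustrative_distance, generalise_cluster and is_k_anonymous are all hypothetical names introduced here.

```python
from collections import Counter
from functools import reduce

# Hypothetical value hierarchy (child -> parent); an assumption, not from the paper.
PARENT = {
    "Sydney": "NSW", "Newcastle": "NSW",
    "Brisbane": "QLD", "Cairns": "QLD",
    "NSW": "Australia", "QLD": "Australia",
    "Australia": None,
}


def ancestors(value):
    """Return the path from value up to the root, with value itself first."""
    path = []
    while value is not None:
        path.append(value)
        value = PARENT[value]
    return path


def closest_common_generalisation(a, b):
    """Return the lowest hierarchy node that covers both a and b."""
    covers_a = set(ancestors(a))
    for node in ancestors(b):
        if node in covers_a:
            return node
    raise ValueError("values do not share a hierarchy")


def illustrative_distance(a, b):
    """Count the generalisation steps needed to make a and b identical.

    Illustrative only; this is not the paper's generalisation distance.
    """
    target = closest_common_generalisation(a, b)
    return ancestors(a).index(target) + ancestors(b).index(target)


def generalise_cluster(values):
    """Generalise every value in a cluster to their closest common ancestor."""
    target = reduce(closest_common_generalisation, values)
    return [target] * len(values)


def is_k_anonymous(rows, k):
    """Check that every distinct quasi-identifier tuple occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


if __name__ == "__main__":
    raw = [("Sydney",), ("Newcastle",), ("Brisbane",), ("Cairns",)]
    print(is_k_anonymous(raw, 2))
    # Cluster the first two and last two rows, then generalise each cluster.
    clustered = generalise_cluster(["Sydney", "Newcastle"]) + generalise_cluster(["Brisbane", "Cairns"])
    print(is_k_anonymous([(v,) for v in clustered], 2))
```

In this sketch, a cluster of at least k tuples becomes indistinguishable once its values are replaced by their closest common generalisation, and clusters whose members are close in the hierarchy need fewer generalisation steps, which is the intuition behind measuring distortion with a hierarchy-based distance.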
Similar resources
Anonymizing classification data using rough set theory
Identity disclosure is one of the most serious privacy concerns in many data mining applications. A well-known privacy model for protecting against identity disclosure is k-anonymity. The main goal of anonymizing classification data is to protect individual privacy while maintaining the utility of the data in building classification models. In this paper, we present an approach based on rough sets for m...
On Enhancing Data Utility in K-anonymization for Data without Hierarchical Taxonomies
K-anonymity is a model widely used to protect the privacy of individuals when publishing microdata. It can be defined as clustering with a constraint of at least k tuples in each group. K-anonymity cuts the linking confidence between sensitive information and a specific individual down to the ratio 1/k. However, the accuracy of the data in a k-anonymous dataset decreases due to informatio...
A Privacy Protection Model for Patient Data with Multiple Sensitive Attributes
The identity of patients must be protected when patient data are shared. The two most commonly used models to protect the identity of patients are L-diversity and K-anonymity. However, existing work mainly considers data sets with a single sensitive attribute, while patient data often contain multiple sensitive attributes (e.g., diagnosis and treatment). This article shows that although the K-anony...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
A Clustering Approach for Achieving Data Privacy
New privacy regulations together with ever-increasing data availability and computational power have created a huge interest in data privacy research. One major research direction is built around the k-anonymity property and its extensions, which are required for the released data. In this paper we present such an extension to k-anonymity, called p-sensitive k-anonymity, which solves some of the weak...